Add project clone retries functionality#1613
Conversation
Signed-off-by: David Kwon <dakwon@redhat.com> Assisted-by: Claude Code Opus 4.6
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dkwon17 The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
📝 WalkthroughWalkthroughAdds retry logic with exponential backoff to the initial git clone flow, cleans up failed temp clones between attempts, preserves underlying errors on final failure, and exposes retry count via a Changes
Sequence Diagram(s)mermaid Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
project-clone/internal/global.go (1)
68-76: Consider adding an upper bound forCloneRetries.The validation rejects negative values but allows arbitrarily large retry counts. With exponential backoff, high values like
CloneRetries=30would result in a delay of ~17 years before the final retry attempt (1s * 2^29).Consider capping at a reasonable maximum (e.g., 10) to prevent configuration mistakes from causing excessively long waits.
Suggested change
+const maxCloneRetries = 10 + CloneRetries = defaultCloneRetries if val := os.Getenv(cloneRetriesEnvVar); val != "" { parsed, err := strconv.Atoi(val) - if err != nil || parsed < 0 { - log.Printf("Invalid value for %s: %q, using default (%d)", cloneRetriesEnvVar, val, defaultCloneRetries) + if err != nil || parsed < 0 || parsed > maxCloneRetries { + log.Printf("Invalid value for %s: %q (must be 0-%d), using default (%d)", cloneRetriesEnvVar, val, maxCloneRetries, defaultCloneRetries) } else { CloneRetries = parsed } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@project-clone/internal/global.go` around lines 68 - 76, The env parsing for CloneRetries (using cloneRetriesEnvVar and defaultCloneRetries) only rejects negatives but allows huge values; add an upper bound (e.g., maxCloneRetries = 10) and when parsed > maxCloneRetries log a warning and set CloneRetries = maxCloneRetries, otherwise set CloneRetries = parsed; keep the existing invalid-number fallback behavior and use the same log.Printf pattern to report out-of-range values.project-clone/internal/git/setup.go (1)
66-68: Use%wfor proper error wrapping.Using
%sformats the error as a string but loses the error chain, preventing callers from usingerrors.Is()orerrors.Unwrap()to inspect the underlying cause.Suggested change
if cloneErr != nil { - return fmt.Errorf("failed to clone project: %s", cloneErr) + return fmt.Errorf("failed to clone project: %w", cloneErr) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@project-clone/internal/git/setup.go` around lines 66 - 68, The error return currently formats cloneErr with "%s" which discards the error chain; update the return in the clone error path to wrap the original error using "%w" so callers can use errors.Is/errors.Unwrap (i.e., change the return that references cloneErr to use "%w" to wrap the error).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@project-clone/internal/git/setup.go`:
- Around line 66-68: The error return currently formats cloneErr with "%s" which
discards the error chain; update the return in the clone error path to wrap the
original error using "%w" so callers can use errors.Is/errors.Unwrap (i.e.,
change the return that references cloneErr to use "%w" to wrap the error).
In `@project-clone/internal/global.go`:
- Around line 68-76: The env parsing for CloneRetries (using cloneRetriesEnvVar
and defaultCloneRetries) only rejects negatives but allows huge values; add an
upper bound (e.g., maxCloneRetries = 10) and when parsed > maxCloneRetries log a
warning and set CloneRetries = maxCloneRetries, otherwise set CloneRetries =
parsed; keep the existing invalid-number fallback behavior and use the same
log.Printf pattern to report out-of-range values.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 62363f36-541d-466e-bc69-a145673dc401
📒 Files selected for processing (2)
project-clone/internal/git/setup.goproject-clone/internal/global.go
Signed-off-by: dkwon17 <dakwon@redhat.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@project-clone/internal/git/setup.go`:
- Around line 66-68: The clone error return currently exits without removing the
temporary clone directory (tmpClonePath), leaving stale data; ensure
tmpClonePath is cleaned on failure by calling os.RemoveAll(tmpClonePath) before
returning (or by registering a defer cleanup immediately after creating
tmpClonePath) in the function that performs the clone (refer to the tmpClonePath
variable and the clone error handling block that returns fmt.Errorf("failed to
clone project: %w", cloneErr)).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a72c18e9-e8fb-47ea-9cd8-f1a9f4fc692e
📒 Files selected for processing (2)
project-clone/internal/git/setup.goproject-clone/internal/global.go
| if cloneErr != nil { | ||
| return fmt.Errorf("failed to clone project: %w", cloneErr) | ||
| } |
There was a problem hiding this comment.
Final clone-failure path skips temp directory cleanup.
After the last failed attempt, Line 67 returns immediately and can leave partial repo data in tmpClonePath. Over repeated init-container failures, stale temp directories may accumulate under the mounted projects volume.
Proposed fix
if cloneErr != nil {
+ if err := os.RemoveAll(tmpClonePath); err != nil {
+ log.Printf("Warning: cleanup after final clone failure failed: %s", err)
+ }
return fmt.Errorf("failed to clone project: %w", cloneErr)
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if cloneErr != nil { | |
| return fmt.Errorf("failed to clone project: %w", cloneErr) | |
| } | |
| if cloneErr != nil { | |
| if err := os.RemoveAll(tmpClonePath); err != nil { | |
| log.Printf("Warning: cleanup after final clone failure failed: %s", err) | |
| } | |
| return fmt.Errorf("failed to clone project: %w", cloneErr) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@project-clone/internal/git/setup.go` around lines 66 - 68, The clone error
return currently exits without removing the temporary clone directory
(tmpClonePath), leaving stale data; ensure tmpClonePath is cleaned on failure by
calling os.RemoveAll(tmpClonePath) before returning (or by registering a defer
cleanup immediately after creating tmpClonePath) in the function that performs
the clone (refer to the tmpClonePath variable and the clone error handling block
that returns fmt.Errorf("failed to clone project: %w", cloneErr)).
There was a problem hiding this comment.
Even if project-clone failed to clone, the workspace will still start. Having partial data in the tmp folder be useful for debugging.
There was a problem hiding this comment.
@dkwon17, that's a good point — keeping the partial clone data in the temp directory after failure aids debugging. I'll withdraw this suggestion.
✏️ Learnings added
Learnt from: dkwon17
Repo: devfile/devworkspace-operator PR: 1613
File: project-clone/internal/git/setup.go:66-68
Timestamp: 2026-04-10T19:50:55.397Z
Learning: In `project-clone/internal/git/setup.go`, leaving the partial/stale clone data in `tmpClonePath` after a final clone failure is intentional: the workspace still starts even when project-clone fails, and the leftover data in the temp directory is useful for post-failure debugging. Do not flag the absence of `os.RemoveAll(tmpClonePath)` cleanup on the final error return path as an issue.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: CR
Repo: devfile/devworkspace-operator PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-27T15:35:18.698Z
Learning: Applies to test/e2e/**/*_test.go : In e2e tests creating DevWorkspace, MUST use 'DeleteDevWorkspaceAndWait' in cleanup (AfterAll or AfterEach) to delete workspace and wait for PVC cleanup. Using 'DeleteDevWorkspace' alone may leave PVC and cause conflicts.
Learnt from: CR
Repo: devfile/devworkspace-operator PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-27T15:35:18.698Z
Learning: Applies to controllers/workspace/**/*.go : When workspace fails, call 'failWorkspace()' with appropriate 'FailureReason' (ReasonBadRequest, ReasonInfrastructureFailure, ReasonWorkspaceEngineFailure, or ReasonUnknown)
Learnt from: CR
Repo: devfile/devworkspace-operator PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-27T15:35:18.698Z
Learning: Workspace bootstrapping is supported via 'controller.devfile.io/bootstrap-devworkspace: true' attribute when a devfile cannot be resolved but the project can be cloned. Flow: workspace starts with generic deployment, project-clone clones projects, finds devfile, DevWorkspace is updated with found devfile, workspace restarts with new definition.
Signed-off-by: dkwon17 <dakwon@redhat.com>
There was a problem hiding this comment.
🧹 Nitpick comments (2)
project-clone/internal/global.go (2)
38-40: Use guideline-compliant constant naming for new retry constants.
cloneRetriesEnvVar,defaultCloneRetries, andmaxCloneRetriesare lower camelCase. Rename them to PascalCase or UPPER_SNAKE_CASE to align with the repo’s Go constant naming rule.As per coding guidelines: "Use PascalCase or UPPER_SNAKE_CASE for constant names."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@project-clone/internal/global.go` around lines 38 - 40, The constants cloneRetriesEnvVar, defaultCloneRetries, and maxCloneRetries use lowerCamelCase; rename them to follow the project's constant naming rule (PascalCase or UPPER_SNAKE_CASE) throughout the file and all usages—e.g., ProjectCloneRetriesEnvVar or CLONE_RETRIES_ENV_VAR, DefaultCloneRetries or DEFAULT_CLONE_RETRIES, and MaxCloneRetries or MAX_CLONE_RETRIES—update their declarations and all references (search for cloneRetriesEnvVar, defaultCloneRetries, maxCloneRetries) to the new names to ensure compilation and consistency.
18-31: Reorder import groups to match repository Go conventions.The import block currently places a project-local package before third-party imports. Please order as: standard library, third-party/Kubernetes, then project-local.
As per coding guidelines: "Organize imports into three groups separated by blank lines: (1) standard library, (2) third-party + Kubernetes, (3) project-local."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@project-clone/internal/global.go` around lines 18 - 31, The import block is misordered: move imports into three groups (standard library, third-party/Kubernetes, then project-local) separated by blank lines; specifically keep "errors", "fmt", "log", "os", "regexp", "strconv", "time" together as the first group, then place go-git imports ("github.com/go-git/...", "github.com/kevinburke/ssh_config") and any other third-party/Kubernetes packages as the second group, and place the project-local package "github.com/devfile/devworkspace-operator/pkg/library/constants" as the final group; update the import ordering in the file's import block accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@project-clone/internal/global.go`:
- Around line 38-40: The constants cloneRetriesEnvVar, defaultCloneRetries, and
maxCloneRetries use lowerCamelCase; rename them to follow the project's constant
naming rule (PascalCase or UPPER_SNAKE_CASE) throughout the file and all
usages—e.g., ProjectCloneRetriesEnvVar or CLONE_RETRIES_ENV_VAR,
DefaultCloneRetries or DEFAULT_CLONE_RETRIES, and MaxCloneRetries or
MAX_CLONE_RETRIES—update their declarations and all references (search for
cloneRetriesEnvVar, defaultCloneRetries, maxCloneRetries) to the new names to
ensure compilation and consistency.
- Around line 18-31: The import block is misordered: move imports into three
groups (standard library, third-party/Kubernetes, then project-local) separated
by blank lines; specifically keep "errors", "fmt", "log", "os", "regexp",
"strconv", "time" together as the first group, then place go-git imports
("github.com/go-git/...", "github.com/kevinburke/ssh_config") and any other
third-party/Kubernetes packages as the second group, and place the project-local
package "github.com/devfile/devworkspace-operator/pkg/library/constants" as the
final group; update the import ordering in the file's import block accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 65e38a1c-3820-48a8-bb76-6861f3d85b6d
📒 Files selected for processing (1)
project-clone/internal/global.go
Assisted-by: Claude Code Opus 4.6
What does this PR do?
Adds project clone retries functionality.
For the time being, by default, there will be up to 3 retries (4 attempts in total) with an exponential backoff. In a later PR, it would be nice to add a field in the DWOC to configure the number of retry attempts.
What issues does this PR fix or reference?
https://redhat.atlassian.net/browse/CRW-9779
Is it tested? How?
After building the image, install DWO:
I have tested two cases:
Verify that regular cloning is working as intended
Create the following DW and verify that project-clone was successful:
Successful project-clone init container:
Verify that retries are working as intended.
Create the following DW. Note that in this DW, the
devworkspace-operatorgit project has typos in the remote urls, which will cause failures during git clone.Verify in the project-clone init container logs, that retries have been logged:
PR Checklist
/test v8-devworkspace-operator-e2e, v8-che-happy-pathto trigger)v8-devworkspace-operator-e2e: DevWorkspace e2e testv8-che-happy-path: Happy path for verification integration with CheSummary by CodeRabbit
New Features
Bug Fixes